
    Yeast gene CMR1/YDL156W is consistently co-expressed with genes participating in DNA-metabolic processes in a variety of stringent clustering experiments

    © 2013 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited. The binarization of consensus partition matrices (Bi-CoPaM) method has, among its unique features, the ability to perform ensemble clustering over the same set of genes from multiple microarray datasets, using various clustering methods, in order to generate tunable tight clusters. We therefore applied the Bi-CoPaM method to the 500 most synchronized cell-cycle-regulated yeast genes from different microarray datasets to produce four tight, specific and exclusive clusters of co-expressed genes. We found that 19 genes formed the tightest of the four clusters, and these included the gene CMR1/YDL156W, which was uncharacterized at the time of our investigations. Two very recent proteomic and biochemical studies have independently revealed many facets of the CMR1 protein, although its precise functions remain to be elucidated. Our computational results complement these biological results and add further evidence to their recent finding that CMR1 potentially participates in many DNA-metabolic processes such as replication, repair and transcription. Interestingly, our results demonstrate close co-expression of CMR1 with the replication protein A (RPA), the cohesin complex and the DNA polymerases α, δ and ɛ, and suggest functional relationships between CMR1 and these proteins. In addition, the analysis provides further substantial evidence that the expression of the CMR1 gene could be regulated by the MBF complex.
In summary, the application of a novel analytic technique to large biological datasets has provided supporting evidence on the role of a gene of previously unknown function, further hypotheses to test, and a more general demonstration of the value of sophisticated methods for exploring the large datasets now so readily generated in biological experiments. National Institute for Health Research.

    Clustering consistency in neuroimaging data analysis

    Clustering techniques have been applied to neuroscience data analysis for decades, and new algorithms continue to be developed and applied to address different problems. However, in applications of clustering it is often hard to select the appropriate algorithm and to evaluate the quality of the clustering results, because the ground truth is unknown. Conclusions may also be biased when they rest on a single algorithm, because each algorithm makes its own assumptions about the structure of the data, and these may not match the real data. In this paper, we explore the benefits of integrating the clustering results from multiple clustering algorithms with a tunable consensus clustering strategy and demonstrate the importance and necessity of consistency in neuroimaging data analysis.

    Exploration of distance metrics in consensus clustering analysis of FMRI data

    Clustering techniques have gained great popularity in neuroscience data analysis, especially for data from complex experimental paradigms where traditional model-based methods are hard to apply. However, many clustering algorithms are available nowadays, and even within an individual algorithm, choices such as parameter settings and distance metrics are very likely to affect the final clustering results. In our previous work, we demonstrated the benefits of integrating clustering results from multiple clustering algorithms, which provides more stable, reproducible and complete clustering solutions. In this paper, we further inspect the possible influence of the choice of distance metric in clustering analysis.
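
A small Python sketch shows how the choice of distance metric alone can change which signals look "similar". It is not taken from the paper. The three signals are hypothetical, and the two metrics are simply common choices: Euclidean distance, and one minus the Pearson correlation. Under Euclidean distance the query profile is closest to a signal of similar amplitude. Under correlation distance it is closest to a signal with the same shape at ten times the amplitude. A clustering built on either metric could therefore group these signals differently.

```python
from math import dist, sqrt
from statistics import mean
from typing import Callable, Sequence

Profile = Sequence[float]

def euclidean_distance(x: Profile, y: Profile) -> float:
    """Straight-line distance; sensitive to the absolute scale of the signals."""
    return dist(x, y)

def correlation_distance(x: Profile, y: Profile) -> float:
    """1 - Pearson correlation; sensitive only to the shape of the signals."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - num / den

def nearest(query: Profile, candidates: list[Profile],
            metric: Callable[[Profile, Profile], float]) -> int:
    """Index of the candidate closest to the query under the given metric."""
    return min(range(len(candidates)), key=lambda i: metric(query, candidates[i]))

# Hypothetical signals: b has the same shape as a at ten times the scale,
# c has almost the same values as a but a slightly different shape.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [1.0, 2.0, 3.0, 5.0]

print(nearest(a, [b, c], euclidean_distance))    # 1 (c is nearest)
print(nearest(a, [b, c], correlation_distance))  # 0 (b is nearest)
```

Correlation distance treats b as identical to a because only the shape of the time course matters. This is often what is wanted for co-activation patterns, but it discards amplitude information that Euclidean distance keeps.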

    Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

    Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant to a problem and should not be assigned to any cluster, while other genes may participate in several biological functions and should belong to multiple clusters simultaneously. Also, these algorithms cannot generate tight clusters that focus on their cores, or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed in which all three eventualities are possible: a gene can be assigned exclusively to a single cluster, assigned to multiple clusters, or not assigned to any cluster. These possibilities are realised through the primary novelty of the paradigm, the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as binarization of consensus partition matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide and complementary clusters that can meet the requirements of different gene discovery studies. National Institute for Health Research.
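
The aggregate-then-binarize idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The input partitions are assumed to share the same number of clusters, with labels already aligned; the full method has to relabel clusters first.
- Only one simple binarization rule is shown: a value threshold with a hypothetical tuning parameter `delta`. The paper describes its own set of binarization techniques.
- The function names and the three input partitions are made up for illustration.

```python
from typing import List

Partition = List[List[int]]    # K x M binary matrix: clusters x genes
CoPaM = List[List[float]]      # K x M fuzzy memberships in [0, 1]

def consensus_partition_matrix(partitions: List[Partition]) -> CoPaM:
    """Average several binary partitions element-wise into a fuzzy CoPaM.

    Assumes every partition has the same K clusters and that cluster labels
    have already been aligned across partitions (relabelling is omitted).
    """
    n = len(partitions)
    k, m = len(partitions[0]), len(partitions[0][0])
    return [[sum(p[i][j] for p in partitions) / n for j in range(m)]
            for i in range(k)]

def value_threshold_binarize(copam: CoPaM, delta: float) -> Partition:
    """Assign gene j to cluster i whenever its consensus membership >= delta.

    A high delta yields tight clusters and may leave genes unassigned;
    a low delta yields wide clusters in which a gene may belong to several.
    """
    return [[1 if value >= delta else 0 for value in row] for row in copam]

def unassigned_genes(binary: Partition) -> List[int]:
    """Indices of genes that belong to no cluster after binarization."""
    return [j for j in range(len(binary[0])) if not any(row[j] for row in binary)]

# Three hypothetical clustering results over 4 genes and 2 clusters.
runs = [
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 1, 1, 0], [0, 0, 0, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
]
copam = consensus_partition_matrix(runs)
tight = value_threshold_binarize(copam, 1.0)   # unanimous agreement only
wide = value_threshold_binarize(copam, 0.3)    # any substantial support
print(tight, unassigned_genes(tight))  # [[1, 0, 0, 0], [0, 0, 0, 1]] [1, 2]
print(wide)                            # [[1, 1, 1, 0], [0, 1, 1, 1]]
```

With `delta` at 1.0, only genes on which all three runs agree are kept, so genes 1 and 2 are left unassigned. Lowering it to 0.3 places those two genes in both clusters. The toy data thus reproduces the exclusive, multiple and no-assignment eventualities described in the abstract.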

    UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

    Background: Collective analysis of the growing number of gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify, in a tuneable manner, the subsets of genes that are consistently co-expressed in all of the provided datasets. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods on synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations that may not be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset. It can also identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and use it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach to synthesising gene expression datasets from real data profiles. It combines the ground-truth knowledge of synthetic data with the realistic expression values of real data, and therefore overcomes the problem of faithfulness in synthetic expression data modelling. By application to these datasets, we validate UNCLES and compare it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts.
Furthermore, in-silico-based hypotheses are drawn regarding the function of a few previously unknown genes in those focused clusters. Conclusions: The UNCLES method, the M-N scatter plots technique and the expression data synthesis approach will have wide application in the comprehensive analysis of multiple complex biological datasets from genomic and other sources. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004).
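
A simple check makes concrete the idea of a gene set that is consistently co-expressed in one subset of datasets but poorly co-expressed in another. The Python sketch below is a deliberately simplified stand-in and not the UNCLES algorithm itself. It scores a given candidate gene set by its mean pairwise Pearson correlation in each dataset. The thresholds `high` and `low`, the function names and the toy profiles are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence

Dataset = Dict[str, Sequence[float]]  # gene name -> expression profile

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def mean_pairwise_correlation(data: Dataset, genes: List[str]) -> float:
    """Average correlation over all gene pairs in the candidate set."""
    return mean(pearson(data[g], data[h]) for g, h in combinations(genes, 2))

def differentially_coexpressed(genes: List[str], positive: List[Dataset],
                               negative: List[Dataset],
                               high: float = 0.8, low: float = 0.3) -> bool:
    """True if the genes are tightly co-expressed in every 'positive' dataset
    and poorly co-expressed in every 'negative' dataset."""
    return (all(mean_pairwise_correlation(d, genes) >= high for d in positive)
            and all(mean_pairwise_correlation(d, genes) <= low for d in negative))

# Toy profiles: the three genes rise together in one dataset only.
pos = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7]}
neg = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 4, 1, 4]}
genes = ["g1", "g2", "g3"]
print(differentially_coexpressed(genes, [pos], [neg]))  # True
print(differentially_coexpressed(genes, [neg], [pos]))  # False
```

This check only evaluates a gene set that is already given. Finding candidate sets in the first place is the harder part, and UNCLES addresses it by unifying clustering results across the datasets.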

    SMART: Unique splitting-while-merging framework for gene clustering

    Copyright © 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are properly set, yet it is difficult to set these parameters a priori. To address this issue, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART). It requires no a priori knowledge of either the number of clusters or even its possible range. Existing self-splitting algorithms over-cluster the dataset into a large number of clusters and then merge similar clusters. In contrast, our framework splits and merges clusters automatically during the process and, by intrinsically integrating many clustering techniques and tasks, produces the most reliable clustering results. The SMART framework is implemented in two algorithms based on two distinct clustering paradigms: competitive learning and finite mixture models. Nevertheless, many other algorithms for different clustering paradigms can be derived within the SMART framework. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms with which it was compared.
The three main properties of the proposed SMART framework are: (1) it needs no dataset-dependent parameters or a priori knowledge about the datasets; (2) it is extendible to many different applications; and (3) it offers superior performance compared with counterpart algorithms. National Institute for Health Research.

    Cmr1/WDR76 defines a nuclear genotoxic stress body linking genome integrity and protein quality control

    DNA replication stress is a source of genomic instability. Here we identify changed mutation rate 1 (Cmr1) as a factor involved in the response to DNA replication stress in Saccharomyces cerevisiae. We show that Cmr1, together with Mrc1/Claspin, Pph3, the chaperonin containing TCP1 (CCT) and 25 other proteins, defines a novel intranuclear quality control compartment (INQ). INQ sequesters misfolded, ubiquitylated and sumoylated proteins in response to genotoxic stress. The diversity of proteins that localize to INQ indicates that other biological processes, such as cell cycle progression and chromatin and mitotic spindle organization, may also be regulated through INQ. Like Cmr1, its human orthologue WDR76 responds to proteasome inhibition and DNA damage by relocalizing to nuclear foci and physically associating with CCT, suggesting an evolutionarily conserved biological function. We propose that Cmr1/WDR76 plays a role in the recovery from genotoxic stress through regulation of the turnover of sumoylated and phosphorylated proteins.

    In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer

    © The Author(s), 2017. Background: Hypoxia is a characteristic of breast tumours indicating poor prognosis. Many poor-prognosis signatures have been identified on the assumption that genes up-regulated under hypoxia in cell lines should predict poor prognosis in clinical data. However, it has been observed that cell-line data do not always concur with clinical data, so conclusions from cell-line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia-related contexts are available, integrative approaches that investigate these datasets collectively, while not ignoring clinical data, are required. Results: We collectively analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions. To do so, we employ the unique capabilities of the UNCLES method, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This has been demonstrated by comparison with the state-of-the-art iCluster method. This collection of genome-wide datasets includes 15,588 genes. From it, UNCLES identified a relatively high number of genes (>1000 overall) that are consistently co-regulated over all of the datasets. Some of these genes are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated clusters were identified: the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are consistently positively correlated with hypoxia response genes, unlike the observation in cell lines.
Moreover, a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes appears to predict poor prognosis comparably to, and perhaps better than, known hypoxia signatures. Conclusions: We present a clustering approach suitable for integrating data from diverse experimental set-ups. Its application to breast cancer cell-line datasets reveals new hypoxia-regulated gene signatures. These genes behave differently when in vitro (cell-line) data are compared with in vivo (clinical) data, and the signatures have a prognostic value comparable to or exceeding that of state-of-the-art hypoxia signatures. Dr. Abu-Jamous would like to acknowledge the financial assistance from Brunel University London. Professors Buffa and Harris acknowledge support from Cancer Research UK, EU framework 7, and the Oxford NIHR Biomedical Research Centre. Professor Harris acknowledges support from the Breast Cancer Research Foundation. Professor Nandi would like to acknowledge that this work was partly supported by the National Science Foundation of China grant number 61520106006 and the National Science Foundation of Shanghai grant number 16JC1401300. The funding bodies had no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.

    Traditional knowledge of wild edible plants used in Palestine (Northern West Bank): A comparative study

    Abstract. Background: A comparative food ethnobotanical study was carried out in fifteen local communities distributed across five districts of the Palestinian Authority, PA (northern West Bank): six in Nablus, two in Jenin, two in Salfit, three in Qalqilia and two in Tulkarm. These are among the areas of the PA whose rural inhabitants primarily subsisted on agriculture and therefore still preserve traditional knowledge of wild edible plants. Methods: Data on the use of wild edible plants were collected over a one-year period through semi-structured interviews, conducted with informed consent, with 190 local informants. A semi-quantitative approach was used to document use diversity and the relative importance of each species. Results and discussion: The study recorded 100 wild edible plant species. Of these, 76 were mentioned by three or more informants and were distributed across 70 genera and 26 families. The most significant species include Majorana syriaca, Foeniculum vulgare, Malva sylvestris, Salvia fruticosa, Cyclamen persicum, Micromeria fruticosa, Arum palaestinum, Trigonella foenum-graecum, Gundelia tournefortii and Matricaria aurea. All ten species with the highest mean cultural importance (mCI) values were cited in all five areas, and most were important in every region. A common cultural background may explain these similarities. One taxon in particular, Majorana syriaca, was among the most quoted species in almost all areas surveyed. Cultural importance (CI) values serve as a measure of traditional botanical knowledge. For edible species in relatively remote and isolated areas (Qalqilia and Salfit), these values were generally higher than for the same species in other areas.
This can be attributed to the fact that local knowledge of wild edible plants, and plant gathering itself, are more widespread in remote or isolated areas. Conclusion: Gathering, processing and consuming wild edible plants are still practised in all the studied Palestinian areas. About 26% (26/100) of the recorded wild botanicals, including the most quoted species and those with the highest mCI values, are currently gathered and used in all the areas. This demonstrates that there are ethnobotanical contact points among the various Palestinian regions. The habit of using wild edible plants is still alive in the PA, but it is disappearing. Therefore, recording, preserving and passing this knowledge on to future generations is pressing and fundamental.

    Molecular signatures of the rediae, cercariae and adult stages in the complex life cycles of parasitic flatworms (Digenea: Psilostomatidae)

    BACKGROUND: Parasitic flatworms (Trematoda: Digenea) represent one of the most remarkable examples of drastic morphological diversity among the stages within a life cycle. Which genes are responsible for the extreme differences in anatomy, physiology, behavior and ecology among the stages? Here we report a comparative transcriptomic analysis of parthenogenetic and amphimictic generations in two evolutionarily informative species of Digenea belonging to the family Psilostomatidae. METHODS: In this study, the transcriptomes of the redia, cercaria and adult worm stages of Psilotrema simillimum and Sphaeridiotrema pseudoglobulus were sequenced and analyzed. High-quality transcriptomes were generated, and the reference sets of protein-coding genes were used for differential expression analysis in order to identify stage-specific genes. Comparative analysis of gene sets, their expression dynamics, and Gene Ontology enrichment analysis were performed for three life stages within each species and between the two species. RESULTS: Reference transcriptomes for P. simillimum and S. pseudoglobulus include 21,433 and 46,424 sequences, respectively. Among 14,051 orthologous groups (OGs), 1354 are common to and specific for the two analyzed psilostomatid species, whereas 13 and 43 OGs were unique to P. simillimum and S. pseudoglobulus, respectively. In P. simillimum, more than 60% of analyzed genes were active in the redia, cercaria and adult worm stages. In S. pseudoglobulus, by contrast, fewer than 40% of genes had such a ubiquitous expression pattern. Overall, 7805 (36.41%) and 30,622 (65.96%) of genes were preferentially expressed in one of the analyzed stages of P. simillimum and S. pseudoglobulus, respectively. In both species, 12 clusters of co-expressed genes were identified, and more than half of the genes in the reference sets were included in these clusters.
Functional specialization of the life cycle stages was clearly supported by Gene Ontology enrichment analysis. CONCLUSIONS: During the life cycles of the two species studied, most genes change their expression levels considerably. Consequently, the molecular signature of a stage is not only a unique set of expressed genes but also the specific levels of their expression. Our results indicate an unexpectedly high level of plasticity in gene regulation between closely related species. The transcriptomes of P. simillimum and S. pseudoglobulus provide a high-quality reference resource for future evolutionary studies and comparative analyses.